Evaluation of a Japanese CFG Derived from a Syntactically Annotated Corpus with Respect to Dependency Measures

Authors

  • Tomoya Noro
  • Chimato Koike
  • Taiichi Hashimoto
  • Takenobu Tokunaga
  • Hozumi Tanaka
Abstract

Parsing is one of the important processes in natural language processing, and a large-scale CFG is generally used to parse a wide variety of sentences. For many languages, a CFG is derived from a large-scale syntactically annotated corpus, and many parsing algorithms using CFGs have been proposed. However, these could not be applied to Japanese because no Japanese syntactically annotated corpus has been available so far. To solve this problem, we have been building a large-scale Japanese syntactically annotated corpus. In this paper, we present evaluation results for a CFG derived from our corpus and compare them with the results of several Japanese dependency analyzers.
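As a rough illustration of what deriving a CFG from a syntactically annotated corpus can mean, the sketch below reads bracketed trees and counts the production rules they contain. It is a minimal sketch, not the authors' procedure: the bracketed format, the labels (S, NP, VP, N, V), the romanized toy words and the function names parse_tree, productions and derive_cfg are all assumptions made for this example.

```python
from collections import Counter


def parse_tree(text):
    """Parse a bracketed tree string into nested (label, children) tuples.

    The bracketed format is an assumption for this sketch; the corpus
    described in the abstract may use a different annotation format.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip the opening "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())       # nested constituent
            else:
                children.append(tokens[pos])  # terminal word
                pos += 1
        pos += 1                      # skip the closing ")"
        return (label, children)

    return node()


def productions(tree):
    """Yield one (lhs, rhs) rule for every internal node of the tree."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    yield (label, rhs)
    for child in children:
        if isinstance(child, tuple):
            yield from productions(child)


def derive_cfg(treebank):
    """Count how often each rule occurs across all trees in the treebank."""
    counts = Counter()
    for text in treebank:
        counts.update(productions(parse_tree(text)))
    return counts


if __name__ == "__main__":
    # Hypothetical two-sentence treebank with romanized toy words.
    toy_treebank = [
        "(S (NP (N inu)) (VP (V hashiru)))",
        "(S (NP (N neko)) (VP (V neru)))",
    ]
    for (lhs, rhs), n in derive_cfg(toy_treebank).items():
        print(f"{lhs} -> {' '.join(rhs)}  [{n}]")
```

The rule counts could later be normalized into probabilities, but the abstract does not say whether the evaluated grammar is used that way.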

Related resources

A Large-Scale Japanese CFG Derived from a Syntactically Annotated Corpus and Its Evaluation

Although large-scale grammars are prerequisite for parsing a great variety of sentences, it is difficult to build such grammars by hand. Yet, it is possible to build a context-free grammar (CFG) by deriving it from a syntactically annotated corpus. Many such corpora have been built recently to obtain statistical information concerning corpus-based NLP technologies. For English, it is well known...

Building a Large-Scale Japanese CFG for Syntactic Parsing

Large-scale grammars are a prerequisite for parsing a great variety of sentences, but it is difficult to build such grammars by hand. Yet, it is possible to derive a context-free grammar (CFG) automatically from an existing large-scale, syntactically annotated corpus. While being seemingly a simple task at first sight, CFGs derived in such a fashion have hardly ever been applied to an existing s...

Syntactically annotated corpora of Estonian

Syntactically annotated corpora are needed 1) to train and test parsers and various language technology products (grammar checkers, information retrievers and extractors, machine translators, etc.); 2) to check the agreement of existing linguistic theories with real language usage. The corpora can be annotated on different levels of depth. In shallow syntactically annotated corpora a syntact...

Mining Syntactically Annotated Corpora with XQuery

This paper presents a uniform approach to data extraction from syntactically annotated corpora encoded in XML. XQuery, which incorporates XPath, has been designed as a query language for XML. The combination of XPath and XQuery offers flexibility and expressive power, while corpus specific functions can be added to reduce the complexity of individual extraction tasks. We illustrate our approach...

Using a Partially Annotated Corpus to Build a Dependency Parser for Japanese

We explore the use of a partially annotated corpus to build a dependency parser for Japanese. We examine two types of partially annotated corpora. It is found that a parser trained with a corpus that does not have any grammatical tags for words can demonstrate an accuracy of 87.38%, which is comparable to the current state-of-the-art accuracy on the Kyoto University Corpus. In contrast, a parse...

Publication date: 2005